1. Choosing a Regression Model for Forecasting

Let $Y$ be a variable (representing hourly electricity demand, daily
births, weekly money supply, monthly retail sales of department stores,
quarterly nonfarm inventory investment, annual gross national product, and
so on) whose value at time $t$, denoted $Y(t)$, one desires to forecast (predict)
or explain. Either aim is accomplished by decomposing the value $Y(t)$ into
the sum of two components as follows:

$$Y(t) = Y^{\mu}(t) + Y^{\nu}(t)$$

where $Y^{\mu}(t)$ is the explained or predictable part of $Y(t)$, and
$Y^{\nu}(t)$ is the error, or unexplained, or unpredictable part of $Y(t)$.

The explained part $Y^{\mu}(t)$ is usually a linear function of "explanatory"
variables $X_1(t), X_2(t), \ldots$ with coefficients denoted $\beta_1, \beta_2, \ldots$;
explicitly,

$$Y^{\mu}(t) = \beta_1 X_1(t) + \ldots + \beta_j X_j(t) + \ldots + \beta_k X_k(t) + \ldots$$


We write the sum as a possibly infinite series, because in theory there is an
infinite number of possible explanatory variables. However, only a finite
number of explanatory variables $X_j$ are expected to have coefficients $\beta_j$
which are different enough from zero that the benefit (in mean square error
terms) of estimating $\beta_j$ is to be preferred to the cost of considering
$\beta_j$ to be equal to 0, and thus omitting $X_j$ from the model.

The error term $Y^{\nu}(t)$ is best regarded as the residual $Y(t) - Y^{\mu}(t)$
after constructing $Y^{\mu}(t)$ to explain as much as possible of the value of
$Y(t)$. We call $Y^{\nu}(t)$ the innovation at time $t$.

We use the Greek letter nu as a superscript to indicate that $Y^{\nu}(t)$
represents what's "new" (or "transitory") in $Y(t)$ after explaining as much
of its value as possible by the best available explanatory variables
$X_1(t), \ldots, X_k(t)$. We use the Greek letter mu as a superscript because it
connotes a "mean," and $Y^{\mu}(t)$ connotes an average or smooth value about which
$Y(t)$ fluctuates.

It is customary to denote the error term by $\epsilon(t)$, and write the
model for $Y(t)$ as a regression model

$$Y(t) = \beta_1 X_1(t) + \ldots + \beta_k X_k(t) + \epsilon(t), \quad t = 1, 2, \ldots, n.$$

The "ordinary" or "naive" regression model assumes that the
errors $\epsilon(t)$ are independent identically distributed random variables, each
with mean zero and variance $\sigma^2$. The best estimators of $\beta_1, \ldots, \beta_k$
(according to such criteria as least squares, maximum likelihood, or minimum
variance unbiased), denoted $\hat\beta_1, \ldots, \hat\beta_k$, in the model (now written in matrix
form)

$$Y = X\beta + \epsilon$$

are the solutions of the normal equations

$$X'X\hat\beta = X'Y,$$

in which

$$\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
X = \begin{bmatrix} X_1(1) & \cdots & X_k(1) \\ \vdots & & \vdots \\ X_1(n) & \cdots & X_k(n) \end{bmatrix}, \quad
Y = \begin{bmatrix} Y(1) \\ \vdots \\ Y(n) \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon(1) \\ \vdots \\ \epsilon(n) \end{bmatrix}.$$

The explained and unexplained parts of $Y$ are then

$$Y^{\mu} = X\hat\beta = AY, \quad Y^{\nu} = (I - A)Y,$$

where

$$A = X(X'X)^{-1}X'.$$

$A$ is called (see Hoaglin and Welsch (1976)) the hat matrix; I call $I - A$
the whitening matrix. The residual sum of squares can be represented

$$\sum_{t=1}^{n} |Y^{\nu}(t)|^2 = \|Y^{\nu}\|^2 = \|(I - A)Y\|^2.$$

The sample multiple correlation coefficient $R$ of $Y$ given $X$ is defined
by

$$1 - R^2 = \|(I - A)Y\|^2 \div \|Y\|^2.$$

Given the regression model $Y = X\beta + \epsilon$ one may distinguish three
problems: estimate

(i) $X\beta = Y^{\mu}$, smoothing or forecasting;

(ii) $\beta$, parameter estimation;

(iii) $X$, model identification.

We have described the solution to the parameter estimation problem
when the model matrix $X$ is fixed, but this solution is regarded as
unsatisfactory on various criteria; Dempster, Schatzoff, and Wermuth (1977) compare
alternative solutions, including subset regression (choose an optimal submatrix
$[X_{j_1} : X_{j_2} : \ldots : X_{j_p}]$ of variables on which to regress $Y$) and ridge
regression (estimate $\beta$ by

$$\hat\beta = (X'X + \lambda I)^{-1} X'Y$$

for a suitable choice of ridge parameter $\lambda$). Which procedure to use in
practice is best determined adaptively from the data rather than a priori on
the basis of theoretical considerations. I believe recent research by Wahba (1976)
provides insight into a criterion which can be used to choose the optimal
regression model and parameter estimator. To each model and estimator one
can associate: (1) a hat matrix $A$ [for subset regression it is
$A = X_{(p)}(X_{(p)}'X_{(p)})^{-1}X_{(p)}'$; for ridge regression it is
$A = X(X'X + \lambda I)^{-1}X'$] and (2) a "cross-validation" criterion

$$\mathrm{CV}(A) = \frac{\|(I - A)Y\|^2}{\{\mathrm{Trace}(I - A)\}^2}.$$

The optimum smoother $\hat Y^{\mu} = \hat A Y$ corresponds to the hat matrix $\hat A$, defined as the
matrix minimizing $\mathrm{CV}(A)$ over the family of hat matrices $A$ one is considering.
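
As a rough illustration of choosing among hat matrices by the cross-validation criterion, the following sketch builds ridge hat matrices $X(X'X + \lambda I)^{-1}X'$ for a few values of $\lambda$ and picks the one with the smallest CV value. The data, the grid of $\lambda$ values, and all function names are invented for illustration, and the criterion is coded in the form $\|(I - A)Y\|^2 / \{\mathrm{Trace}(I - A)\}^2$; it is a sketch under those assumptions, not the procedure of any particular program.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_hat_matrix(x, lam):
    """A = X (X'X + lam I)^{-1} X'; lam = 0 gives the least-squares hat matrix."""
    xt = transpose(x)
    xtx = matmul(xt, x)
    k = len(xtx)
    reg = [[xtx[i][j] + (lam if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    return matmul(matmul(x, inverse(reg)), xt)


def cv_criterion(a, y):
    """CV(A) = ||(I - A) y||^2 / {trace(I - A)}^2."""
    n = len(y)
    resid = [y[i] - sum(a[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum(r * r for r in resid)
    trace = sum(1.0 - a[i][i] for i in range(n))
    return rss / trace ** 2


# Hypothetical data: a noisy straight line observed at t = 0, ..., 7.
x = [[1.0, float(t)] for t in range(8)]
y = [0.9, 3.2, 4.8, 7.1, 9.0, 10.8, 13.1, 15.0]
ridge_grid = [0.0, 0.01, 0.1, 1.0, 10.0]  # illustrative candidate values
scores = {lam: cv_criterion(ridge_hat_matrix(x, lam), y) for lam in ridge_grid}
best_lam = min(scores, key=scores.get)
```

The same `cv_criterion` function could be applied to subset-regression hat matrices, so that subset and ridge candidates are compared on one scale.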

The justification for this assertion is partly its successful application
in practice and partly a variety of theoretical properties. Its justification
is not the aim of this paper; rather we seek only to note the existence
of criteria for selecting a regression model in order to motivate our approach
toward finding criteria for selecting a time series model.

2. Identifying a Stationary Time Series Model

Let $Y(t)$, $t = 0, \pm 1, \ldots$ be a normal stationary time series. Then the
explanatory variables to be used to explain or predict $Y(t)$ are past values
$X_1(t) = Y(t - 1), X_2(t) = Y(t - 2), \ldots$ of which in theory there are an
infinite number. In the decomposition $Y(t) = Y^{\mu}(t) + Y^{\nu}(t)$, $Y^{\mu}(t)$ is
defined to be the conditional expectation of $Y(t)$ given the infinite past
$Y(t - 1), Y(t - 2), \ldots$:

$$Y^{\mu}(t) = E[Y(t) \mid Y(t - 1), Y(t - 2), \ldots],$$

called the infinite memory one step ahead predictor. The model identification
problem then corresponds to finding the memory $m$ such that the finite memory
predictor

$$Y^{\mu,m}(t) = E[Y(t) \mid Y(t - 1), \ldots, Y(t - m)]$$

performs as well as the infinite memory predictor, or more precisely $Y^{\mu}(t)$
and $Y^{\mu,m}(t)$ almost coincide.

In the theory of time series analysis it is useful to define an ideal time
series model by the condition that $Y^{\mu}(t)$ and $Y^{\mu,m}(t)$ exactly coincide; we
define the time series $Y(\cdot)$ to be an autoregressive scheme of order $m$ if

$$Y^{\mu}(t) = Y^{\mu,m}(t)$$

or equivalently

$$Y^{\nu,m}(t) = Y(t) - Y^{\mu,m}(t)$$

is white noise in the sense that

$$E[Y^{\nu,m}(s)\, Y^{\nu,m}(t)] = 0 \quad \text{for } s \neq t.$$

The coefficients in the representation of $Y^{\mu,m}(t)$ as a linear combination of
$Y(t - j)$, $j = 1, \ldots, m$ are denoted by $-\alpha_{j,m}$, where $\alpha$ symbolizes "autoregressive."
Thus we write

$$-Y^{\mu,m}(t) = \alpha_{1,m} Y(t - 1) + \ldots + \alpha_{m,m} Y(t - m).$$

Define the backward shift operator $L$ by

$$L\,Y(t) = Y(t - 1)$$

(which means the same as $B$ in the notation of Box and Jenkins). Finally
define the polynomial

$$g_m(z) = 1 + \alpha_{1,m} z + \ldots + \alpha_{m,m} z^m.$$

Then

$$Y^{\nu,m}(t) = Y(t) - Y^{\mu,m}(t) = g_m(L)\, Y(t).$$

Similarly we define infinite memory (i) predictions, (ii) autoregressive
transfer function, and (iii) innovations:

(i) $-Y^{\mu}(t) = \alpha_{1,\infty} Y(t - 1) + \ldots + \alpha_{m,\infty} Y(t - m) + \ldots$

(ii) $g_{\infty}(z) = 1 + \alpha_{1,\infty} z + \ldots + \alpha_{m,\infty} z^m + \ldots$

(iii) $Y^{\nu}(t) = g_{\infty}(L)\, Y(t)$

For future reference define the mean square prediction errors

$$\sigma_m^2 = E[|Y^{\nu,m}(t)|^2], \quad \sigma_{\infty}^2 = E[|Y^{\nu}(t)|^2].$$

We call $g_{\infty}(z)$ the autoregressive transfer function (ARTF) of the
stationary time series $Y(\cdot)$ since one can write symbolically

time series $Y(\cdot)$ --> [ $g_{\infty}(L)$ ] --> $\epsilon(\cdot)$ white noise.

In words, $g_{\infty}(L)$ is the whitening filter.

The time series model identification problem can be defined to be
the estimation of $g_{\infty}$; this is equivalent to defining the regression modeling
problem as estimation of the "optimal" hat matrix $A$.


We will attempt to clarify the role of finite parameter schemes, such
as AR (autoregressive) schemes of order $p$:

$$g_p(L)\, Y(t) = \epsilon(t);$$

MA (moving average) schemes of order $q$:

$$Y(t) = h_q(L)\, \epsilon(t), \quad h_q(z) = 1 + \beta_1 z + \ldots + \beta_q z^q;$$

ARMA (autoregressive-moving average) schemes of order $(p,q)$:

$$g_p(L)\, Y(t) = h_q(L)\, \epsilon(t).$$

To interpret the polynomials $g_p(z)$ and $h_q(z)$ parametrizing an ARMA scheme,
one uses them to form the ARTF:

$$g_{\infty}(z) = h_q^{-1}(z)\, g_p(z).$$
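
To make the ratio of polynomials concrete, the sketch below expands $g_p(z)/h_q(z)$ as a power series, which gives the leading coefficients of the ARTF implied by an ARMA scheme. The function name and the example coefficients are illustrative assumptions; polynomials are passed as coefficient lists starting with the constant term.

```python
def artf_coefficients(g_p, h_q, n_terms):
    """First n_terms power-series coefficients of g_p(z) / h_q(z).

    Both polynomials are lists [c0, c1, c2, ...] with c0 != 0 for h_q.
    Uses h_q(z) * a(z) = g_p(z), solved term by term for a(z).
    """
    out = []
    for k in range(n_terms):
        acc = g_p[k] if k < len(g_p) else 0.0
        for j in range(1, min(k, len(h_q) - 1) + 1):
            acc -= h_q[j] * out[k - j]
        out.append(acc / h_q[0])
    return out


# Hypothetical MA(1) scheme Y(t) = (1 - 0.5 L) e(t): g_p(z) = 1, h_q(z) = 1 - 0.5 z.
ma1_artf = artf_coefficients([1.0], [1.0, -0.5], 6)
```

For this invented MA(1) example the ARTF coefficients decay geometrically (1, 0.5, 0.25, ...), which is why a moving average scheme corresponds to an autoregressive scheme of infinite order that a finite AR scheme can approximate.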


The assumption of an ARMA scheme is adopted to provide a
parsimonious finite parametric representation of $g_{\infty}$ in order to enable it
to be estimated. However, one can estimate $g_{\infty}$ non-parametrically using
approximating AR schemes. It is my view that in modeling stationary time
series only AR schemes are needed. However, it will be shown in the next
section that ARMA schemes are useful for modeling non-stationary time series.

To form estimators $\hat\sigma_m^2, \hat\alpha_{1,m}, \ldots, \hat\alpha_{m,m}$ of the parameters of an AR scheme
of order $m$ we use the Yule-Walker equations: for $v = 1, \ldots, m$

$$\hat\rho(-v) + \hat\alpha_{1,m}\hat\rho(1 - v) + \ldots + \hat\alpha_{m,m}\hat\rho(m - v) = 0$$

where

$$\hat\rho(v) = \sum_{t=1}^{T-v} Y(t)\,Y(t + v) \div \sum_{t=1}^{T} Y^2(t)$$

is an estimator of $\rho(v) = \mathrm{corr}(Y(t), Y(t + v))$. Further

$$\hat\sigma_m^2 = \hat\rho(0) + \hat\alpha_{1,m}\hat\rho(1) + \ldots + \hat\alpha_{m,m}\hat\rho(m).$$
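
One standard way to solve the Yule-Walker equations for every order $m$ at once is the Levinson-Durbin recursion; the text does not say which solver was used, so the sketch below is an assumption in that respect. It keeps the sign convention $g_m(z) = 1 + \alpha_{1,m} z + \ldots + \alpha_{m,m} z^m$ and returns the normalized residual variances $\hat\sigma_m^2$ alongside the coefficients. Function names are illustrative.

```python
def sample_autocorrelations(y, max_lag):
    """rho_hat(v) = sum_t y[t] * y[t+v] / sum_t y[t]^2, v = 0..max_lag.

    No mean is subtracted, matching the formula in the text.
    """
    denom = sum(v * v for v in y)
    return [sum(y[t] * y[t + v] for t in range(len(y) - v)) / denom
            for v in range(max_lag + 1)]


def yule_walker_all_orders(rho, max_order):
    """Levinson-Durbin recursion on autocorrelations rho[0..max_order].

    Returns (alphas, sigma2) where alphas[m] = [alpha_{1,m}, ..., alpha_{m,m}]
    and sigma2[m] = rho(0) + sum_j alpha_{j,m} * rho(j).
    """
    alphas = [[]]
    sigma2 = [rho[0]]
    for m in range(1, max_order + 1):
        prev = alphas[-1]
        num = rho[m] + sum(prev[j] * rho[m - 1 - j] for j in range(m - 1))
        k = -num / sigma2[-1]  # new last coefficient alpha_{m,m}
        new = [prev[j] + k * prev[m - 2 - j] for j in range(m - 1)] + [k]
        alphas.append(new)
        sigma2.append(sigma2[-1] * (1.0 - k * k))
    return alphas, sigma2


# Illustrative autocorrelations (not taken from any data set in the text).
rho_example = [1.0, 0.6, 0.2, -0.1]
alphas_example, sigma2_example = yule_walker_all_orders(rho_example, 3)
```

Because the recursion produces $\hat\sigma_1^2, \hat\sigma_2^2, \ldots$ as a by-product, it fits naturally with an order-selection rule that needs the residual variance at every candidate order.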


It seems plausible that there is a value of $m$, denoted $\hat m$, such that

$$\hat g_m(z) = 1 + \hat\alpha_{1,m} z + \ldots + \hat\alpha_{m,m} z^m$$

for $m = \hat m$ is an "optimal" estimator of $g_{\infty}$; in symbols,

$$\hat g_{\infty}(z) = \hat g_{\hat m}(z).$$


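One can show that approximately the overall mean square percentage
error of any $\hat g_m$ as an estimator of $g_{\infty}$ satisfies

$$E\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}
\left|\frac{\hat\sigma_m^{-2}\,\hat g_m(e^{i\omega}) - \sigma_{\infty}^{-2}\, g_{\infty}(e^{i\omega})}
{\sigma_{\infty}^{-2}\, g_{\infty}(e^{i\omega})}\right|^2 d\omega\right]
= \frac{1}{T}\sum_{j=1}^{m} \sigma_j^{-2} + \sigma_{\infty}^{-2} - \sigma_m^{-2}$$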


where $T$ is the sample size. Therefore in practice we choose $\hat m$ to minimize
a criterion function $\mathrm{CAT}(m)$, defined as follows:

$$\mathrm{CAT}(0) = -\left(1 + \frac{1}{T}\right), \qquad
\mathrm{CAT}(m) = \frac{1}{T}\sum_{j=1}^{m} \tilde\sigma_j^{-2} - \tilde\sigma_m^{-2}, \quad m = 1, 2, \ldots$$

where $\tilde\sigma_m^2$ is an unbiased estimator of $\sigma_m^2$ defined by

$$\tilde\sigma_m^2 = \left(1 - \frac{m}{T}\right)^{-1} \hat\sigma_m^2.$$

When $\hat m = 0$ we accept the hypothesis that the observed time series is white
noise.
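
The order-selection rule can be sketched in a few lines. The code below assumes CAT in the form $\mathrm{CAT}(0) = -(1 + 1/T)$ and $\mathrm{CAT}(m) = T^{-1}\sum_{j \le m} \tilde\sigma_j^{-2} - \tilde\sigma_m^{-2}$ with $\tilde\sigma_m^2 = \hat\sigma_m^2/(1 - m/T)$, and takes the normalized residual variances $\hat\sigma_m^2$ as given; the function names and the numerical inputs are invented for illustration.

```python
def cat_criterion(sigma2_hat, T):
    """CAT(0), ..., CAT(M) from residual variances sigma2_hat[0..M].

    sigma2_hat[m] is the normalized residual variance of the order-m AR fit
    (sigma2_hat[0] = 1 when autocorrelations are used); T is the sample size.
    """
    cat = [-(1.0 + 1.0 / T)]
    running = 0.0
    for m in range(1, len(sigma2_hat)):
        unbiased = sigma2_hat[m] / (1.0 - m / T)
        running += 1.0 / unbiased
        cat.append(running / T - 1.0 / unbiased)
    return cat


def select_order(sigma2_hat, T):
    """Return (m_hat, CAT values), where m_hat minimizes CAT."""
    cat = cat_criterion(sigma2_hat, T)
    m_hat = min(range(len(cat)), key=cat.__getitem__)
    return m_hat, cat


# Hypothetical residual variances: a clear drop at order 1, nothing afterwards.
m_hat_ar1, cat_ar1 = select_order([1.0, 0.75, 0.75, 0.75], T=100)
# Hypothetical near-white-noise case: residual variance barely decreases.
m_hat_white, cat_white = select_order([1.0, 0.99, 0.985], T=50)
```

In the first invented case the minimum falls at order 1; in the second it falls at order 0, which corresponds to accepting the white noise hypothesis.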

Having determined the maximum order $\hat m$ of the approximating autoregressive
transfer function $\hat g_{\hat m}(z)$ we next use stepwise regression techniques to
determine the significantly non-zero autoregressive coefficients in the
transfer function. As an example, on monthly data if one determined that
$\hat m = 13$, it would be of interest to determine whether $\hat g_{\hat m}(z)$ were
approximately of the form

$$\hat g_{13}(z) = (1 - \beta_1 z)(1 - \beta_{12} z^{12}).$$

Stepwise, or subset, autoregression is discussed by McClave (1975).

3. Identifying a Non-Stationary Time Series Model

For a non-stationary time series, the modeling problem is not only to
find the whitening filter (which transforms $\{Y(t)\}$ to $\{Y^{\nu}(t)\}$), but
to interpret it as several filters in series:

$D_0$: a detrending filter which in the spectral domain eliminates the
low frequency components corresponding to trend,

$D_{\lambda}$: a de-seasonal filter which in the spectral domain eliminates
the components corresponding to a periodic component with
period $\lambda$, or to the harmonics with frequencies which are
multiples of $2\pi/\lambda$,

$g_{\infty}$ or $\hat g_{\hat m}$: an innovations filter which transforms to white noise the
series $Y^{(\mathrm{stat})}(t) = D_0 D_{\lambda} Y(t)$ representing a transformation
of $Y(t)$ to a stationary series.

The time series modeling problem is thus to find the filter
representation

$Y(t)$ --> [ $D_0$ : detrend ] --> [ $D_{\lambda_1}, \ldots, D_{\lambda_k}$ : deseasonal ] --> [ $g_{\infty}$ : innovations ] --> $\epsilon(t)$

where we admit the possibility of several different periods $\lambda_1, \ldots, \lambda_k$ (for
example in monthly data $\lambda$ values are often 12 and 3, in daily data $\lambda$
values are often 7 and 365, in hourly data $\lambda$ values are often 24 and 168).

Given the above decomposition, one can form various derived series:

$Y^{(0)}(t) = D_0 Y(t)$, the detrended series,

$Y^{(\lambda)}(t) = D_{\lambda} Y(t)$, the seasonally adjusted series,

$Y^{(0,\lambda)}(t) = Y^{(\mathrm{stat})}(t) = D_0 D_{\lambda} Y(t) = D_{\lambda} D_0 Y(t)$, the detrended
seasonally adjusted series,

$Y^{\nu}(t) = Y^{(\mathrm{white})}(t) = g_{\infty} D_0 D_{\lambda} Y(t)$, the innovations series.

Such decompositions seem to be crucial to the study of the relations
between time series $Y_1(\cdot)$ and $Y_2(\cdot)$. To study their relations it seems
clear that if one relates $Y_1(\cdot)$ and $Y_2(\cdot)$ without filtering one will often
find "spurious" relationships. It has been suggested therefore that one
attempt to relate $Y_1^{\nu}(\cdot)$ and $Y_2^{\nu}(\cdot)$, the individual innovations of each
series. What remains to be examined is the insight to be derived from
relating $Y_1^{(\lambda)}(t)$ and $Y_2^{(\lambda)}(t)$, the seasonally adjusted series, or $Y_1^{(\mathrm{stat})}(t)$
and $Y_2^{(\mathrm{stat})}(t)$, the detrended and seasonally adjusted series.

The question remains of how to find in practice the detrending and
deseasonal filters. To seasonally adjust for a period $\lambda$ in data, several
possibilities are available which may be interpreted as seasonal adjustment
filters.

A filter with the same zeroes in the frequency domain as some usual
procedures, which is recursive (acts only on past values), and yields a
variety of filter shapes (in the frequency domain) between a square wave and
a sinusoid is the one-parameter family of filters

$$D_{\lambda}(\theta) = \frac{I - L^{\lambda}}{I - \theta L^{\lambda}},$$

where the parameter $\theta$ is chosen (usually by an estimation procedure) between
0 and 1. When $\theta = 0$ the filter is denoted $\nabla_{\lambda}$ and called the $\lambda$-th difference.

To understand the role of the filter $D_{\lambda}(\theta)$, denote it for brevity by
$D$ and rewrite it as follows; writing $I - L^{\lambda} = I - \theta L^{\lambda} - (1 - \theta) L^{\lambda}$ we obtain

$$D = I - \frac{(1 - \theta) L^{\lambda}}{I - \theta L^{\lambda}}
= I - (1 - \theta)\{L^{\lambda} + \theta L^{2\lambda} + \theta^2 L^{3\lambda} + \ldots\}.$$

Then the output $Y^{(\lambda)}(t) = D\,Y(t)$ of a filter $D$ with input $Y(t)$ can be
written

$$Y^{(\lambda)}(t) = Y(t) - (1 - \theta)\{Y(t - \lambda) + \theta Y(t - 2\lambda) + \theta^2 Y(t - 3\lambda) + \ldots\}.$$

In words, $Y^{(\lambda)}(t)$ is the result of subtracting from $Y(t)$ the exponentially
weighted average of $Y(t - \lambda), Y(t - 2\lambda), \ldots$
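
The two forms of the seasonal filter, the recursive ratio form and the exponentially weighted expansion, can be checked against each other numerically. The sketch below assumes that values before the start of the record are zero (an assumption needed to start the recursion; the text does not specify starting values), and the function names and test series are illustrative.

```python
def seasonal_filter(y, lam, theta):
    """Apply D = (I - L^lam) / (I - theta L^lam) recursively.

    out[t] = y[t] - y[t - lam] + theta * out[t - lam],
    with pre-sample values of y and out taken to be zero (an assumption).
    """
    out = []
    for t in range(len(y)):
        y_lag = y[t - lam] if t >= lam else 0.0
        out_lag = out[t - lam] if t >= lam else 0.0
        out.append(y[t] - y_lag + theta * out_lag)
    return out


def seasonal_filter_expansion(y, lam, theta):
    """Same filter via y[t] - (1 - theta) * sum_k theta^k * y[t - (k+1)*lam]."""
    out = []
    for t in range(len(y)):
        weighted, k = 0.0, 0
        while t - (k + 1) * lam >= 0:
            weighted += theta ** k * y[t - (k + 1) * lam]
            k += 1
        out.append(y[t] - (1.0 - theta) * weighted)
    return out


# Arbitrary illustrative series with period-4 seasonal lag.
series = [float((3 * t * t + 5 * t) % 11) for t in range(40)]
recursive_out = seasonal_filter(series, 4, 0.6)
expansion_out = seasonal_filter_expansion(series, 4, 0.6)
```

With `theta = 0` the recursion reduces to the plain $\lambda$-th difference, and as `theta` approaches 1 the filter subtracts an increasingly long-memory seasonal average.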


It seems to me open to investigation whether the filter $D$ (of mixed
autoregressive-moving average type) is superior to the approximately equivalent
autoregressive filter

$$D' = (I - L^{\lambda})(I + \theta L^{\lambda}) = I - (1 - \theta) L^{\lambda} - \theta L^{2\lambda}$$

whose output $Y^{(\lambda)\prime}(t) = D'Y(t)$ can be written

$$Y^{(\lambda)\prime}(t) = Y(t) - (1 - \theta)\,Y(t - \lambda) - \theta\,Y(t - 2\lambda).$$


It appears to me that the role of moving averages in Box-Jenkins ARIMA
models is to build filters of the type $D_{\lambda}(\theta)$. Thus the ARIMA model

$$(I - L)(I - L^{12})\,Y(t) = (I - \theta_1 L)(I - \theta_{12} L^{12})\,\epsilon(t)$$

should be viewed as the whitening filter

$$D_1(\theta_1)\,D_{12}(\theta_{12})\,Y(t)
= \frac{I - L}{I - \theta_1 L} \cdot \frac{I - L^{12}}{I - \theta_{12} L^{12}}\,Y(t) = \epsilon(t).$$

I should like to emphasize that the output of the filter $D_1(\theta_1)\,D_{12}(\theta_{12})$
is often not white noise but is only a stationary time series. For purposes
of one-step ahead prediction, it is often not important to distinguish
between the case that $D_1 D_{12} Y(t)$ is white noise or not, since most of the
predictability is obtained by finding a suitable transformation to stationarity
of this form (below we discuss naive prediction as a transformation to
stationarity).

The moral to be drawn from the foregoing considerations is as follows.
To find a transformation of a non-stationary time series to stationarity it
may suffice to apply pure differencing operators such as $I - L$ and $I - L^{12}$.
However, the transformation of the residuals to the innovations series should
be expressed if possible in terms of factors corresponding to the filters
$I - \theta_1 L$ and $I - \theta_{12} L^{12}$ since such factors enable us to interpret the overall
whitening filter

$Y(t)$ --> [ Whitening Filter ] --> $\epsilon_t$

as a series of filters

$Y(t)$ --> [ Detrend Filter ] --> [ Deseasonal Filter ] --> [ Innovations Filter ] --> $\epsilon_t$

which can be interpreted as helping to provide solutions to the seasonal
adjustment problem.

Naive Prediction and Transformations to Stationarity: To predict a time
series $Y(t)$ one can often suggest a "naive" predictor of the form

$$Y^{\mathrm{naive}}(t) = Y(t - \lambda_1) + Y(t - \lambda_2) - Y(t - \lambda_1 - \lambda_2).$$

The prediction error of this predictor is given by

$$\tilde Y(t) = Y(t) - Y^{\mathrm{naive}}(t)
= Y(t) - Y(t - \lambda_1) - Y(t - \lambda_2) + Y(t - \lambda_1 - \lambda_2)
= (I - L^{\lambda_1})(I - L^{\lambda_2})\,Y(t).$$

In words, taking $\lambda_1$-th and $\lambda_2$-th differences is equivalent to forming the
naive prediction errors.
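
The equivalence between naive prediction errors and repeated differencing is easy to verify numerically. The following sketch uses an arbitrary integer-valued series of length 144 with lags 1 and 12; the series and the function names are illustrative, not data from the paper.

```python
def difference(y, lag):
    """(I - L^lag) applied to y; the first `lag` values are dropped."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def naive_prediction_errors(y, lam1, lam2):
    """Y(t) - [Y(t - lam1) + Y(t - lam2) - Y(t - lam1 - lam2)]."""
    start = lam1 + lam2
    return [y[t] - (y[t - lam1] + y[t - lam2] - y[t - lam1 - lam2])
            for t in range(start, len(y))]


# Arbitrary integer series so that the comparison is exact.
demo = [(7 * t * t + 3 * t) % 23 for t in range(144)]
errors = naive_prediction_errors(demo, 1, 12)
twice_differenced = difference(difference(demo, 1), 12)
```

Both routes lose the first $\lambda_1 + \lambda_2$ observations, so a monthly series of length 144 differenced at lags 1 and 12 leaves 131 values.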

A criterion that $Y(t)$ be non-stationary is that it be predictable (in
the sense that the ratio of the average square of $\tilde Y(t)$ to the average square
of $Y(t)$ is of the order of $1/T$). When $\tilde Y(t)$ is stationary (non-predictable)
one models it by an approximate autoregressive scheme,

$$\hat g_{\hat m}(L)\,\tilde Y(t) = \epsilon(t),$$

which can be used to form $\tilde Y^{\mu}(t)$, the best one-step ahead predictor of $\tilde Y(t)$.

The best one-step ahead predictor of $Y(t)$ is given by

$$Y^{\mu}(t) = Y^{\mathrm{naive}}(t) + \tilde Y^{\mu}(t);$$

to prove this, note the identity

$$Y(t) = Y^{\mathrm{naive}}(t) + \tilde Y(t)$$

and form the conditional expectation of both sides of this identity with
respect to $Y(t - 1), Y(t - 2), \ldots$

A remarkable fact is the equality of the prediction errors of $Y(t)$ and
$\tilde Y(t)$:

$$Y^{\nu}(t) = Y(t) - Y^{\mu}(t) = \tilde Y(t) - \tilde Y^{\mu}(t) = \tilde Y^{\nu}(t).$$

It follows that to find the whitening filter

$Y(t)$ --> [ Whitening Filter ] --> $\epsilon_t$

for a non-stationary time series $Y(t)$ (which includes almost all time series
with seasonal components) it suffices to apply any one-sided filter (in
practice either suggested by an ad hoc deseasonalizing procedure or found by
applying the CAT method and discovering that $\hat\sigma_{\hat m}^2$ is less than $8/T$) whose
output $\tilde Y(t)$ is stationary. The series filter

$Y(t)$ --> $\tilde Y(t)$ --> $\tilde Y^{\nu}(t) = \epsilon_t$

then yields the whitening filter. While the filter leading to $\tilde Y(t)$ is not
unique, the overall filter leading to $\epsilon_t$ is unique.

The final seasonal-adjustment procedure is a filter which comes
from interpreting the overall whitening filter as a series of filters
which can be interpreted as detrending and deseasonalizing filters.

4. Illustrative Example: International Airline Passengers

To illustrate our approach to time series modeling of seasonal data,
let us consider a series (used as an illustrative example by Box and Jenkins
(1970), p. 305): monthly passenger totals (measured in thousands) in
international air travel 1949-1960 which we denote by $Y(t)$ and its logarithms
which we denote by $Z(t)$. The series length is 144.

The model fitted by Box and Jenkins to the airline data is

$$\nabla_1 \nabla_{12} Z(t) = (I - \theta_1 L)(I - \theta_{12} L^{12})\,\epsilon(t);$$

in words, take first and twelfth differences to transform to a stationary
time series which is modeled as a multiplicative moving average. The
parameters of this model are estimated by Box and Jenkins to be $\theta_1 = .4$,
$\theta_{12} = .6$, $\sigma_{\epsilon}^2 = .0013$. The model fitted to the airline data by Box and
Jenkins can be written in our notation

$$D_1(.4)\,D_{12}(.6)\,Z(t) = \epsilon(t), \quad \sigma_{\epsilon}^2 = .0013.$$


In our approach, one has a choice of first steps.

Choice I: Take first and twelfth differences as obvious "naive"
prediction errors; then analyze

$$\tilde Z(t) = D_{12} D_1 Z(t)$$

which is a time series of length 131.

Choice II: To determine suitable transformations to transform from
a non-stationary time series $Z$ to a stationary time series $\tilde Z$, examine a
best approximating autoregressive scheme whose order $\hat m$ is determined by
CAT and whose parsimonious form is determined by subset autoregression. In
this way one might be led (without directly examining sample autocorrelations)
to try

$$\tilde Z(t) = \nabla_1 Z(t), \quad \text{first differences,}$$

or

$$\tilde Z(t) = \nabla_{12} Z(t), \quad \text{twelfth differences,}$$

as possible transformations to stationarity.


If we adopt Choice I, we note first that $\tilde Z(\cdot)$ has variance
$R_{\tilde Z}(0) = .002$ which is about 1% of the variance of $Z(\cdot)$ which is .194.
This indicates that $Z(\cdot)$ is non-stationary. Fitting $\tilde Z(\cdot)$ by a suitably
long autoregressive scheme with order $\hat m$ determined by CAT, one finds
$\hat m = 12$ with autoregressive coefficients

    α_1 =  .36      α_7  =  .01
    α_2 =  .05      α_8  = -.03
    α_3 =  .15      α_9  = -.16
    α_4 =  .11      α_10 = -.03
    α_5 = -.05      α_11 =  .08
    α_6 = -.09      α_12 =  .34

and residual variance $\hat\sigma_{\epsilon}^2 = .0014$. The best stepwise AR scheme fitted
to $\tilde Z(t)$ is found to have only lags 1 and 12:

$$\tilde Z(t) + .32\,\tilde Z(t - 1) + .37\,\tilde Z(t - 12) = \epsilon(t), \quad \hat\sigma_{\epsilon}^2 = .0015.$$

The residual variance $\hat\sigma_{\epsilon}^2$ of a model is actually computed in our program
as a proportion (here .75) of $R_{\tilde Z}(0)$.

Next one might examine the form of ARMA and MA schemes that fit
$\tilde Z(\cdot)$. A best fitting ARMA scheme is

$$\tilde Z(t) + .34\,\tilde Z(t - 1) = \epsilon(t) - .39\,\epsilon(t - 12)$$

with residual variance $\hat\sigma_{\epsilon}^2 = .73\,R_{\tilde Z}(0) = .0015$. We would regard the ARMA
scheme as identical to the AR scheme. If one forces the model for $\tilde Z(t)$
to be of the MA form

$$\tilde Z(t) = \epsilon(t) + \beta_1 \epsilon(t - 1) + \beta_{12}\,\epsilon(t - 12) + \beta_{13}\,\epsilon(t - 13)$$

one finds that (using for computational speed a statistically inefficient
linear algorithm) $\beta_1 = -.27$, $\beta_{12} = -.38$, $\beta_{13} = .13$ with residual
variance $\hat\sigma_{\epsilon}^2 = .76\,R_{\tilde Z}(0) = .0015$. Note that $\beta_1 \beta_{12} = (-.27)(-.38) \approx .10$, which is close
enough to .13 that one could conclude a multiplicative model

$$\tilde Z(t) = (I - .3L)(I - .4L^{12})\,\epsilon(t);$$

this multiplicative model for $\tilde Z(t)$ enables one to write, in agreement
with Box-Jenkins, that

$$D_1(.3)\,D_{12}(.4)\,Z(t) = \epsilon(t).$$

If we adopt Choice II, we find that the best approximating
autoregressive scheme to $Z(t)$ with order $\hat m$ determined by CAT has $\hat m = 13$,
with coefficients

    α_1  = -1.00     α_7  =  .02
    α_2  =  .09      α_8  =  .08
    α_3  = -.03      α_9  = -.11
    α_4  =  .03      α_10 =  .02
    α_5  = -.02      α_11 =  .07
    α_6  = 1.00      α_12 = -.45
                     α_13 =  00

The residual variance is .06 (of the original variance .1935); it just
about equals the threshold $8/T$ (here $T = 131$ for the residual series,
and $8/T = .06$) below which we consider a residual variance indicates
predictability and non-stationarity. Having determined $\hat m$ one then uses
stepwise autoregression to determine a more parsimonious AR scheme of
order $\hat m$ which includes only autoregressive coefficients significantly
different from zero. In the present case, one would find only lag 1
significant with coefficient .95; therefore as a $\tilde Z$ series representing a
transformation of $Z$ from non-stationarity to stationarity we might choose
first differences: $\tilde Z(t) = Z(t) - Z(t - 1)$, which has residual variance .01.

However, it seems intuitively more meaningful to take twelfth
differences: $\tilde Z(t) = Z(t) - Z(t - 12)$, which has residual variance .0038.
The innovation series $Z^{\nu}(t)$ of $Z(\cdot)$ is found by finding the innovation
series $\tilde Z^{\nu}(\cdot)$ of $\tilde Z(\cdot)$ by fitting to $\tilde Z(\cdot)$ an autoregressive scheme
whose order we now denote by $\hat m$. One obtains $\hat m = 13$ (using CAT) with
residual variance .00127 (comparable to .00134 obtained by Box and
Jenkins). Using stepwise autoregression on $\tilde Z$ one discovers lags 1, 12,
13 have significant coefficients; we therefore form the residuals (with
residual variance .0015)

$$\tilde Z^{\nu}(t) = \tilde Z(t) - .74\,\tilde Z(t - 1) + .38\,\tilde Z(t - 12) - .31\,\tilde Z(t - 13)
\approx (I - .74L)(I + .38L^{12})\,\tilde Z(t).$$
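
The factorization of the three-lag autoregression can be checked by multiplying out the two factors. The short sketch below (function name illustrative) expands $(1 - .74z)(1 + .38z^{12})$ and shows that the implied lag-13 coefficient is $-.74 \times .38 = -.2812$, to be compared with the fitted value $-.31$; the factorization is therefore approximate rather than exact.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


first_order = [1.0, -0.74]                 # 1 - .74 z
seasonal = [1.0] + [0.0] * 11 + [0.38]     # 1 + .38 z^12
product = poly_mul(first_order, seasonal)

lag1, lag12, lag13 = product[1], product[12], product[13]
gap_at_lag13 = abs(lag13 - (-0.31))        # difference from the fitted coefficient
```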


The model we obtain by fitting a parsimonious autoregressive scheme
to twelfth differences is

$$(I - .74L)(I + .38L^{12})(I - L^{12})\,Z(t) = \epsilon(t).$$

It can be written approximately

$$\frac{I - L}{I - .26L} \cdot \frac{I - L^{12}}{I - .38L^{12}}\,Z(t) = \epsilon(t)$$

which is similar to the model found by Box and Jenkins. We prefer to write
our model

$$D_1(.26)\,D_{12}(.38)\,Z(t) = \epsilon(t).$$
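
To see in what sense the ratio form approximates the autoregressive form, one can expand each ratio as a power series and compare leading coefficients. The sketch below (helper names are illustrative) expands $(1 - z)/(1 - .26z)$ and $(1 - z^{12})/(1 - .38z^{12})$; the first begins $1 - .74z$, and the second begins $1 - .62z^{12}$, matching $(1 + .38z^{12})(1 - z^{12}) = 1 - .62z^{12} - .38z^{24}$ through the $z^{12}$ term.

```python
def series_divide(num, den, n_terms):
    """First n_terms power-series coefficients of num(z) / den(z)."""
    out = []
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * out[k - j]
        out.append(acc / den[0])
    return out


def lag_polynomial(coeff, lam):
    """Coefficients of 1 + coeff * z^lam."""
    return [1.0] + [0.0] * (lam - 1) + [coeff]


# D_1(.26) = (1 - z) / (1 - .26 z)
d1 = series_divide(lag_polynomial(-1.0, 1), lag_polynomial(-0.26, 1), 4)
# D_12(.38) = (1 - z^12) / (1 - .38 z^12)
d12 = series_divide(lag_polynomial(-1.0, 12), lag_polynomial(-0.38, 12), 25)
```

The two forms differ in the later terms (for instance the $z^{24}$ coefficient is $-.2356$ in the ratio form and $-.38$ in the autoregressive product), which is the sense in which the rewriting is only approximate.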


We have asserted in Section 3 a theorem that the ideal whitening
filter transforming a non-stationary time series $Y(\cdot)$ to its innovations
$Y^{\nu}(\cdot)$ is unique but its decompositions for purposes of interpretation are
not unique. In practice, one may appear to be able to identify several
"different" approximate whitening filters; additional experience in case
studies is required to determine how to measure differences and equivalences
of whitening filters.